Performance of a SCFG-Based Language Model with Training Data Sets of Increasing Size
Abstract
In this paper, a hybrid language model that combines a word-based n-gram and a category-based Stochastic Context-Free Grammar (SCFG) is evaluated on training data sets of increasing size. Different estimation algorithms for learning SCFGs in General Format and in Chomsky Normal Form are considered. Experiments on the UPenn Treebank corpus are reported. The models are assessed by test-set perplexity and by word error rate in a speech recognition experiment.
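As a rough illustration of the test-set perplexity metric mentioned in the abstract, the sketch below assumes the n-gram and SCFG components are combined by linear interpolation with a weight `lam`. The abstract does not state the combination scheme, so `interpolate`, `lam`, and the probability values are hypothetical and are not results from the paper.

```python
import math


def interpolate(p_ngram, p_scfg, lam):
    """Linearly interpolate two per-word probabilities.

    Assumption: linear interpolation is used here only for illustration;
    the abstract does not say how the two models are combined.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * p_ngram + (1.0 - lam) * p_scfg


def perplexity(word_probs):
    """Perplexity: exp of the negative mean log-probability per word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Hypothetical per-word probabilities for a four-word test sentence.
ngram_probs = [0.10, 0.02, 0.30, 0.05]
scfg_probs = [0.08, 0.06, 0.20, 0.10]

hybrid_probs = [interpolate(a, b, 0.7) for a, b in zip(ngram_probs, scfg_probs)]
print("n-gram perplexity:", perplexity(ngram_probs))
print("hybrid perplexity:", perplexity(hybrid_probs))
```

Lower perplexity means the model assigns higher probability to the test words; a model that gives every word probability 0.25 has perplexity 4.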
Similar resources
MAN-MACHINE INTERACTION SYSTEM FOR SUBJECT INDEPENDENT SIGN LANGUAGE RECOGNITION USING FUZZY HIDDEN MARKOV MODEL
Sign language recognition (SLR) has attracted growing interest in the human–computer interaction community. The major challenge that SLR now faces is developing methods that scale well with increasing vocabulary size when only a limited set of training data is available for signer-independent applications. Automatic SLR based on hidden Markov models (HMMs) is very sensitive to gesture's shape inf...
Shannon’s Entropy of The Stochastic Context-Free Grammar and an Application to RNA Secondary Structure Modeling
Stochastic context-free grammars (SCFGs) have been used in RNA secondary structure modeling. An SCFG consists of a set of grammar rules, each with an associated probability. Given a grammar design, finding the set of probabilities that yields optimum performance can be challenging. Although current Expectation Maximization (EM) Maximum Likelihood (ML)-based model training approaches have been effective,...
Estimation of the mean grain size of mechanically induced Hydroxyapatite based bioceramics via artificial neural network
This study focuses on estimating the mean grain size of mechanically induced Hydroxyapatite (HA) with an artificial neural network (ANN) model. The mean grain sizes of HA and HA-based nanocomposites at different milling parameters were obtained from previous studies. The neural network model was trained and tested on these data. Accordingly, all data (55 sets) were based on the mecha...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Publication date: 2005